Abstract — We consider a low-complexity version of the Compute
and Forward scheme that involves only scaling, offset
(dithering removal) and scalar quantization at the relays. The 
proposed scheme is suited for the uplink of a distributed antenna 
system where the antenna elements must be very simple and are 
connected to a joint processor via orthogonal perfect links of
given rate $R_0$. We consider the design of non-binary LDPC codes
naturally matched to the proposed scheme. Each antenna element 
performs individual (decentralized) Belief Propagation decoding 
of its own quantized signal, and sends a linear combination of 
the users' information messages via the noiseless link to the 
joint processor, which retrieves the users' messages by Gaussian 
elimination. The complexity of this scheme is linear in the coding 
block length and polynomial in the system size (number of relays). 

Index Terms — Distributed Antenna Systems, Relay Networks, 
Compute-and-Forward, Network Coding. 

I. Introduction 

Powerful multimedia-capable devices such as smartphones and tablets
demand higher and higher data rates. Without increasing the system
bandwidth (a scarcely available commodity) or changing the geometry of
the cells, this demand must be met by more and more sophisticated
signal processing. In turn, this trend makes energy efficiency one of
the most pressing problems in modern wireless cellular networks. It is
known that the power consumption of conventional tower-mounted
macro-BSs in today's high-speed data oriented systems (e.g., 3G
HSDPA/HSUPA and 4G LTE [1]) accounts for a large fraction of the
operational costs of wireless cellular operators [2]. Furthermore, the
environmental impact of cellular networks is striking: for example,
the carbon footprint of downloading a 1.5 GB video file (typical
iTunes movie size) from a 3G/HSDPA tower is approximately
22 kg-CO2e [3], comparable with driving an SUV for 40 miles and
significantly larger than downloading the same amount of bits from a
wired link (approximately 0.75 kg-CO2e).

An alternative approach to the conventional BS architecture consists
of a "cloud" BS, where a large number of simple single-antenna
elements are distributed over a wide area and connected to a central
processor via wired backbone links. The distributed antenna elements
perform demodulation, Analog to Digital Conversion (ADC), and possibly
some decentralized decoding operation, and ship their processed
observations to the central processor, which performs some form of
joint decoding. In this way, any User Terminal (UT) finds a BS antenna
element at small distance with high probability, thus allowing for a
very dense spatial reuse. In terms of power consumption, many small
terminals are significantly more power efficient than a single large
one (e.g., no need for power-consuming cooling subsystems).
Furthermore, additional power savings can be obtained by switching off
the antenna elements not in the vicinity of "active" users. Since in
practice cellular users have very large "off" duty cycles, a
correspondingly large fraction of antenna elements is switched off at
any given time.

An uplink distributed antenna system with $L$ users and $L$
single-antenna terminals, connected to a central processor via wired
links of fixed rate $R_0$, forms a three-layer relay network with $L$
sources, $L$ relays and one destination. This scenario was
investigated in [4], where Decode-and-Forward (DF) and
Compress-and-Forward (CF) strategies were analyzed in closed form for
the so-called Wyner model (see Section V). The DF strategy makes the
central processor extremely simple, but it is not compliant with the
basic goal of having very low-complexity distributed antenna elements.
The CF strategy of [4] is in fact a special case of the so-called
"quantize-remap and forward" scheme of [5], also referred to as "noisy
network coding" in [6]. In this case, the relays (i.e., the
distributed antenna elements) just quantize the received signal and
forward the quantization samples.¹ However, the central processor is
required to jointly decode all users from the quantized observations.

An alternative approach consists of Compute and Forward (CoF),
recently developed in [8], [9], and applied to the Wyner model in
[10]. CoF seeks a tradeoff between "quenching" the noise at the relays
(as in DF) and forwarding "noisy" observations without making local
decisions (as in CF). In this scheme, the users encode their
information messages using the same lattice code, and each relay
decodes a linear combination with integer coefficients of the lattice
codewords. These linear combinations are mapped onto linear
combinations of the messages defined on a suitable finite field and
forwarded to the central processor. If no error occurs at the relays
and the overall $L \times L$ linear system has rank $L$, the central
processor can decode the user messages in polynomial time by Gaussian
elimination, as in standard linear network coding [11].

¹ The information-theoretic vector quantization of [5], [6] can be
replaced by scalar quantization with a fixed-gap performance
degradation [7].

In this paper we focus on the uplink of a distributed antenna system
and consider a CoF architecture suited for low-complexity, low-power
system implementation. The proposed scheme is motivated by the
observation that the main bottleneck of a digital receiver is the
Analog to Digital Conversion (ADC), which is costly, power-hungry and
does not scale with Moore's law. Rather, the number of bits per second
produced by an ADC is roughly a constant that depends on the power
consumption [12], [13]. Therefore, it makes sense to consider the ADC
transformation as part of the channel transformation, which becomes
discrete-output in nature. We propose a Quantized CoF (QCoF) scheme
based on one-dimensional lattice modulation and linear codes over
$\mathbb{Z}_p$ ($p$ being a prime number). Each relay recovers a noisy
linear combination of the user codewords, where the noise is discrete
and additive over $\mathbb{Z}_p$. Therefore, the resulting computation
rate coincides with the capacity of a single-user discrete additive
noise channel, achieved by linear codes over $\mathbb{Z}_p$ [14]. For
large $p$, QCoF achieves the computation rate of unquantized CoF
within the shaping loss, about 0.25 bits per real symbol.

We also develop practical Low-Density Parity-Check (LDPC) code
constructions for the additive noise channel over $\mathbb{Z}_p$,
inspired by the work of [15]. We show results for an ensemble of
regular and irregular Repeat-Accumulate (RA) protograph-based codes
over $\mathbb{Z}_p$, the performance of which is evaluated via the
EXIT-chart method. More refined code design (e.g., based on the "ARA"
code structure [16]) is expected to provide further improvements for a
wider range of system parameters.

We compare the spectral efficiency of QCoF with the benchmarks
provided by DF, CF, and CoF, over the standard Wyner model. The
proposed scheme shows competitive performance with respect to much
more involved schemes, which rely on infinite ADC resolution. Also, we
show that unequal power allocation can be used to reduce the
"non-integer" error of QCoF and further improve its performance.

II. Quantized compute and forward

A. Definitions

Let $\mathbb{Z}_p = \mathbb{Z} \bmod p\mathbb{Z}$ denote the finite
field of size $p$, with $p$ a prime number, let $\oplus$ denote
addition over $\mathbb{Z}_p$, and let $g : \mathbb{Z}_p \to \mathbb{R}$
be the function that maps the elements of $\mathbb{Z}_p$ into the
points $\{0, 1, \ldots, p-1\} \subset \mathbb{R}$.

For a lattice $\Lambda$, we define the lattice quantizer
$Q_\Lambda(\mathbf{x}) = \arg\min_{\lambda \in \Lambda} \|\mathbf{x} - \lambda\|$,
the Voronoi region
$\mathcal{V} = \{\mathbf{x} \in \mathbb{R}^n : Q_\Lambda(\mathbf{x}) = \mathbf{0}\}$
and $[\mathbf{x}] \bmod \Lambda = \mathbf{x} - Q_\Lambda(\mathbf{x})$.
Let $p$ be a prime integer and $\kappa \in \mathbb{R}_+$. We consider
the two nested one-dimensional lattices

$$\Lambda_s = \{x = \kappa p z : z \in \mathbb{Z}\}, \qquad \Lambda_c = \{x = \kappa z : z \in \mathbb{Z}\}, \tag{1}$$

and define the constellation set $\mathcal{S} = \Lambda_c \cap \mathcal{V}_s$,
where $\mathcal{V}_s$ is the Voronoi region of $\Lambda_s$, i.e., the
interval $[-\kappa p/2, \kappa p/2)$. The modulation mapping
$\mathcal{M} : \mathbb{Z}_p \to \mathcal{S}$ is defined by
$v = \mathcal{M}(u) = [\kappa g(u)] \bmod \Lambda_s$. The inverse
function $\mathcal{M}^{-1}(\cdot)$ is referred to as the demodulation
mapping, and it is given by
$u = \mathcal{M}^{-1}(v) = g^{-1}([v/\kappa] \bmod p\mathbb{Z})$ with
$v \in \mathcal{S}$.

Let $\mathbf{G} \in \mathbb{Z}_p^{n \times k}$ be a matrix of rank
$k$. The linear code $\mathcal{C}$ over $\mathbb{Z}_p$ with block
length $n$, dimension $k$ and rate $R = \frac{k}{n} \log(p)$ (in
bit/symbol) generated by $\mathbf{G}$ is the set of codewords
$\mathcal{C} = \{\mathbf{c} = \mathbf{G}\mathbf{w} : \mathbf{w} \in \mathbb{Z}_p^k\}$.

B. Modulation and coding for QCoF

Consider the (real-valued) $L$-user Gaussian multiple access channel
with inputs $\{x_{\ell,i} : i = 1, \ldots, n\}$ for
$\ell = 1, \ldots, L$, output $\{y_i : i = 1, \ldots, n\}$ and
coefficients $\mathbf{h} = (h_1, \ldots, h_L)^T \in \mathbb{R}^L$,
defined by

$$y_i = \sum_{\ell=1}^{L} h_\ell x_{\ell,i} + z_i, \qquad i = 1, \ldots, n, \tag{2}$$

where the $z_i$'s are i.i.d. $\sim \mathcal{N}(0, 1)$. For this
channel, we consider a modification of the CoF scheme of [8] that
includes scalar quantization at the receiver as part of the channel.
In the proposed scheme, all users encode their information messages
$\{\mathbf{w}_\ell \in \mathbb{Z}_p^k : \ell = 1, \ldots, L\}$ using
the same linear code $\mathcal{C}$ over $\mathbb{Z}_p$, and produce
their channel inputs according to

$$x_{\ell,i} = [\mathcal{M}(c_{\ell,i}) + d_{\ell,i}] \bmod \Lambda_s, \tag{3}$$

for $i = 1, \ldots, n$, where $c_{\ell,i}$ is the $i$-th component of
codeword $\mathbf{c}_\ell = \mathbf{G}\mathbf{w}_\ell$ and the
$d_{\ell,i}$'s are i.i.d. dithering symbols
$\sim \mathrm{Uniform}(\mathcal{V}_s)$, known at the receiver. The
channel inputs $x_{\ell,i}$ are uniformly distributed over
$\mathcal{V}_s$ and have second moment
$\mathsf{SNR} \triangleq \mathbb{E}[|x_{\ell,i}|^2] = \kappa^2 p^2 / 12$.
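As an illustration of the definitions above, the following minimal Python sketch implements the modulo operation, the modulation and demodulation mappings, and the dithered channel input. The function names (`mod_lattice`, `modulate`, `demodulate`, `channel_input`) are hypothetical and not taken from the paper, and $g$ is assumed to be the natural embedding of $\mathbb{Z}_p$ into $\{0, \ldots, p-1\}$.

```python
import math
import random

def mod_lattice(x, step):
    """[x] mod (step * Z), mapped into the interval [-step/2, step/2)."""
    return x - step * math.floor(x / step + 0.5)

def modulate(u, p, kappa):
    """M(u) = [kappa * g(u)] mod Lambda_s, assuming g(u) = u in {0, ..., p-1}."""
    return mod_lattice(kappa * u, kappa * p)

def demodulate(v, p, kappa):
    """M^{-1}(v) = g^{-1}([v / kappa] mod pZ) for a constellation point v."""
    return round(v / kappa) % p

def channel_input(c, p, kappa, rng):
    """Dithered input x = [M(c) + d] mod Lambda_s with d ~ Uniform(V_s).

    Returns the pair (x, d); the dither d is assumed known at the receiver.
    """
    d = rng.uniform(-kappa * p / 2, kappa * p / 2)
    return mod_lattice(modulate(c, p, kappa) + d, kappa * p), d
```

Averaging `x * x` over many calls to `channel_input` should give a value close to $\kappa^2 p^2 / 12$, in line with the second moment stated above.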

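The receiver's goal is to recover a linear combination
$\mathbf{c} = \bigoplus_{\ell=1}^{L} q_\ell \mathbf{c}_\ell$ of the
transmitted users' codewords, for some coefficients
$q_\ell \in \mathbb{Z}_p$. For this purpose, the receiver selects the
integer coefficients vector
$\mathbf{a} = (a_1, \ldots, a_L)^T \in \mathbb{Z}^L$ and a scaling
factor $\alpha$, and produces the sequence of quantized observations

$$u_i = \mathcal{M}^{-1}\left(\left[Q_{\Lambda_c}\left(\alpha y_i - \sum_{\ell=1}^{L} a_\ell d_{\ell,i}\right)\right] \bmod \Lambda_s\right), \tag{4}$$

for $i = 1, \ldots, n$. Letting $\mathbf{u} = (u_1, \ldots, u_n)^T$,
and using Lemma 1 at the end of this section, we have that the
concatenation of (2) and (4) is equivalent to

$$\mathbf{u} = \bigoplus_{\ell=1}^{L} q_\ell \mathbf{c}_\ell \oplus \mathbf{z}, \tag{5}$$

with $q_\ell = g^{-1}([a_\ell] \bmod p\mathbb{Z})$, and where the
discrete additive noise vector $\mathbf{z}$ has statistics given in
Section III.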
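The chain (2), (3), (4) can be simulated symbol by symbol to observe the discrete noise in (5). The sketch below is a hedged illustration only: the names `relay_symbol` and `simulate` are hypothetical, $g$ is assumed to be the natural embedding, and `noise_std` is an extra knob added for experimentation (the model above fixes the noise variance to 1 and sets the SNR through $\kappa$).

```python
import math
import random

def mod_lattice(x, step):
    """[x] mod (step * Z), mapped into [-step/2, step/2)."""
    return x - step * math.floor(x / step + 0.5)

def quantize(x, step):
    """Nearest point of the lattice step * Z."""
    return step * math.floor(x / step + 0.5)

def relay_symbol(y, dithers, a, alpha, p, kappa):
    """Eq. (4): scale, remove dither, quantize to Lambda_c, reduce mod Lambda_s, demodulate."""
    t = alpha * y - sum(al * dl for al, dl in zip(a, dithers))
    v = mod_lattice(quantize(t, kappa), kappa * p)
    return round(v / kappa) % p

def simulate(h, a, alpha, p, kappa, n, noise_std=1.0, seed=0):
    """Return n samples of the discrete noise z = u - sum_l q_l c_l (mod p)."""
    rng = random.Random(seed)
    q = [al % p for al in a]  # q_l = g^{-1}([a_l] mod pZ)
    noise = []
    for _ in range(n):
        c = [rng.randrange(p) for _ in h]
        d = [rng.uniform(-kappa * p / 2, kappa * p / 2) for _ in h]
        # Eq. (3): dithered channel inputs.
        x = [mod_lattice(mod_lattice(kappa * cl, kappa * p) + dl, kappa * p)
             for cl, dl in zip(c, d)]
        # Eq. (2): Gaussian multiple access channel.
        y = sum(hl * xl for hl, xl in zip(h, x)) + rng.gauss(0.0, noise_std)
        u = relay_symbol(y, d, a, alpha, p, kappa)
        target = sum(ql * cl for ql, cl in zip(q, c)) % p
        noise.append((u - target) % p)
    return noise
```

With integer channel coefficients equal to $\mathbf{a}$, $\alpha = 1$ and negligible Gaussian noise, every returned sample should be zero, which is the content of Lemma 1; non-integer coefficients or stronger noise produce non-zero samples of $\mathbf{z}$.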

Notice that $\mathbf{u}$ is obtained by componentwise analog
operations (scaling and translation) and scalar quantization. In fact,
the scalar quantization by $\Lambda_c$ followed by the modulo
$\Lambda_s$ operation can be obtained by concatenating a sawtooth
memoryless transformation with a standard $p$-point scalar quantizer,
as shown in Fig. 1. The components $u_i$ of $\mathbf{u}$ can be sent
to the central processor at the rate of $\log(p)$ bit/symbol.
Alternatively, the linear combination $\mathbf{c}$ can be locally
decoded, and the corresponding linear combination of the user
messages, $\mathbf{w} = \bigoplus_{\ell=1}^{L} q_\ell \mathbf{w}_\ell$,
can be sent to the central processor at rate $\frac{k}{n} \log(p)$
bit/symbol.

Fig. 1. Implementation of $Q_{\Lambda_c}(\cdot)$ followed by modulo
$\Lambda_s$ using an analog sawtooth transformation and a finite-level
scalar quantizer.

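Letting $\mathcal{D} : \mathbb{Z}_p^n \to \mathcal{C}$ denote a
decoder for $\mathcal{C}$, we define the average error probability as
$P_e(\mathbf{h}, \mathbf{a}) = P(\mathcal{D}(\mathbf{u}) \neq \mathbf{c})$,
for fixed coefficients $\mathbf{h}$, $\mathbf{a}$, averaged over the
messages, the channel noise and the dithering signals. A computation
rate $R(\mathbf{h}, \mathbf{a})$ for the QCoF scheme described above
is achievable if there exists a sequence of $(n, k)$ codes
$\mathcal{C}$ such that
$\liminf_{n \to \infty} \frac{k}{n} \log(p) \geq R(\mathbf{h}, \mathbf{a})$
and $\lim_{n \to \infty} P_e(\mathbf{h}, \mathbf{a}) = 0$.

Lemma 1: For $u_\ell \in \mathbb{Z}_p$, let
$u = \bigoplus_{\ell=1}^{L} q_\ell u_\ell$ for some coefficients
$q_\ell \in \mathbb{Z}_p$. Also, set $v_\ell = \mathcal{M}(u_\ell)$
and $v = \left[\sum_{\ell=1}^{L} a_\ell v_\ell\right] \bmod \Lambda_s$
for some $a_\ell \in \mathbb{Z}$ such that
$q_\ell = g^{-1}([a_\ell] \bmod p\mathbb{Z})$. Then, we have
$u = \mathcal{M}^{-1}(v)$. □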

III. Achievable computation rate 

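First we examine the marginal statistics of the discrete noise
$\mathbf{z}$ in (5). Rewriting (4) by letting
$v_{\ell,i} = \mathcal{M}(c_{\ell,i})$ and by omitting the subscript
$i$ for brevity, we have

$$\begin{aligned}
\mathcal{M}(u) &= \left[Q_{\Lambda_c}\left(\alpha y - \sum_{\ell=1}^{L} a_\ell d_\ell\right)\right] \bmod \Lambda_s \\
&= \left[Q_{\Lambda_c}\left(\alpha \sum_{\ell=1}^{L} h_\ell x_\ell + \alpha z - \sum_{\ell=1}^{L} a_\ell d_\ell\right)\right] \bmod \Lambda_s \\
&= \left[Q_{\Lambda_c}\left(\sum_{\ell=1}^{L} a_\ell v_\ell + \sum_{\ell=1}^{L} (\alpha h_\ell - a_\ell) x_\ell + \alpha z\right)\right] \bmod \Lambda_s \\
&= \left[Q_{\Lambda_c}\left(\sum_{\ell=1}^{L} a_\ell v_\ell + \sum_{\ell=1}^{L} e_\ell + \alpha z\right)\right] \bmod \Lambda_s,
\end{aligned} \tag{6}$$

where we let $e_\ell = (\alpha h_\ell - a_\ell) x_\ell$ and where we
used the fact that, by (3), $\sum_{\ell=1}^{L} a_\ell (v_\ell + d_\ell)$
and $\sum_{\ell=1}^{L} a_\ell x_\ell$ differ by some point of
$\Lambda_s$ and, for any $y \in \mathbb{R}$ and $\lambda \in \Lambda_s$,
we have $Q_{\Lambda_c}(y + \lambda) = Q_{\Lambda_c}(y) + \lambda$.

By the well-known Crypto-Lemma [17], the non-integer error term
$e_\ell$ is statistically independent of $v_\ell$ and is uniformly
distributed in
$[-\sqrt{3\mathsf{SNR}}(\alpha h_\ell - a_\ell), \sqrt{3\mathsf{SNR}}(\alpha h_\ell - a_\ell))$.
The overall error and noise term is
$\varepsilon = \sum_{\ell=1}^{L} e_\ell + \alpha z$, with mean zero
and variance

$$\sigma_\varepsilon^2 = \mathsf{SNR} \|\alpha \mathbf{h} - \mathbf{a}\|^2 + \alpha^2. \tag{7}$$

The components of the effective noise $\mathbf{z}$ in (5) are
distributed as

$$z = \mathcal{M}^{-1}\left([Q_{\Lambda_c}(\varepsilon)] \bmod \Lambda_s\right) \tag{8}$$

and have a pmf that can be calculated numerically and that is well
approximated by assuming
$\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$.

Since decoding the linear combination
$\bigoplus_{\ell=1}^{L} q_\ell \mathbf{c}_\ell$ from the noisy
discrete observation (5) is equivalent to decoding the linear code
$\mathcal{C}$ over the additive noise discrete channel
$\mathbf{u} = \mathbf{c} \oplus \mathbf{z}$, we have:

Theorem 1: For given $\mathbf{h}$, $\mathbf{a}$, $\alpha$ and
modulation order $p$, the largest achievable computation rate of QCoF
is equal to the capacity of the discrete additive noise channel
$y = x \oplus z$, given by
$R(\mathbf{h}, \mathbf{a}, \alpha) = \log p - H(z)$. □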
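Under the Gaussian approximation $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ mentioned above, the pmf of $z$ in (8) and the rate of Theorem 1 can be evaluated numerically. The following sketch is one possible way to do so, not the authors' procedure: the names `noise_pmf` and `computation_rate` are hypothetical, the rate is expressed in bits (base-2 logarithm), and the infinite sum over the wrap-around index is truncated at about eight standard deviations.

```python
import math

def gaussian_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def noise_pmf(sigma, p, kappa):
    """pmf of z = M^{-1}([Q_{Lambda_c}(eps)] mod Lambda_s), eps ~ N(0, sigma^2).

    z = j whenever eps falls in a quantization cell centered at
    kappa * (j + p * m) for some integer m (wrap-around of the modulo).
    """
    wraps = int(math.ceil(8.0 * sigma / (kappa * p))) + 1  # truncation (assumption)
    pmf = [0.0] * p
    for j in range(p):
        for m in range(-wraps, wraps + 1):
            center = kappa * (j + p * m)
            pmf[j] += (gaussian_cdf((center + kappa / 2) / sigma)
                       - gaussian_cdf((center - kappa / 2) / sigma))
    return pmf

def computation_rate(sigma, p, kappa):
    """R = log2(p) - H(z) in bit/symbol, as in Theorem 1."""
    pmf = noise_pmf(sigma, p, kappa)
    entropy = -sum(q * math.log2(q) for q in pmf if q > 0.0)
    return math.log2(p) - entropy
```

For $\sigma_\varepsilon \ll \kappa$ the pmf concentrates on $z = 0$ and the rate approaches $\log_2 p$; for $\sigma_\varepsilon \gg \kappa p$ the pmf becomes nearly uniform and the rate approaches zero.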

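The QCoF computation rate can be maximized by minimizing the entropy
$H(z)$ with respect to $\mathbf{a} \in \mathbb{Z}^L$ and $\alpha$ (as
a function of $\mathbf{h}$, $\mathsf{SNR}$ and $p$). This is generally
a difficult problem. Instead, we resort to the suboptimal (but much
simpler) problem of minimizing the variance $\sigma_\varepsilon^2$ in
(7). First, it is immediate to see that the optimal $\alpha$ is given
by [8]

$$\alpha^* = \frac{\mathsf{SNR}\, \mathbf{h}^T \mathbf{a}}{1 + \mathsf{SNR} \|\mathbf{h}\|^2}.$$

Replacing $\alpha^*$ into (7), we obtain

$$\sigma_\varepsilon^2 = \mathsf{SNR} \left( \|\mathbf{a}\|^2 - \frac{\mathsf{SNR} |\mathbf{h}^T \mathbf{a}|^2}{1 + \mathsf{SNR} \|\mathbf{h}\|^2} \right) = \mathbf{a}^T \left( \mathsf{SNR}^{-1} \mathbf{I} + \mathbf{h}\mathbf{h}^T \right)^{-1} \mathbf{a}. \tag{9}$$

The quadratic form in (9) is positive definite for any
$\mathsf{SNR} < \infty$, since the matrix
$(\mathsf{SNR}^{-1} \mathbf{I} + \mathbf{h}\mathbf{h}^T)^{-1}$ has
eigenvalues

$$\lambda_1 = \frac{\mathsf{SNR}}{1 + \mathsf{SNR} \|\mathbf{h}\|^2}, \qquad \lambda_2 = \cdots = \lambda_L = \mathsf{SNR}.$$

By Cholesky decomposition, there exists a lower triangular matrix
$\mathbf{L}$ such that
$\sigma_\varepsilon^2 = \|\mathbf{L}^T \mathbf{a}\|^2$. It follows
that the problem of minimizing $\sigma_\varepsilon^2$ over
$\mathbf{a} \in \mathbb{Z}^L$ is equivalent to finding the "shortest
lattice point" of the $L$-dimensional lattice generated by
$\mathbf{L}^T$. This can be efficiently obtained using the LLL
algorithm [18], possibly followed by Pohst or Schnorr-Euchner
enumeration (see [19]) of the non-zero lattice points in a sphere
centered at the origin, with radius equal to the shortest vector found
by LLL.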
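For small $L$, the minimization of (9) over non-zero integer vectors can also be illustrated by exhaustive search over a bounded box, which is used here only as a simple stand-in for LLL followed by enumeration. The function names and the search bound `max_abs` are assumptions introduced for this sketch.

```python
import itertools

def effective_noise_variance(h, a, snr):
    """Eq. (9): SNR * (||a||^2 - SNR * (h.a)^2 / (1 + SNR * ||h||^2))."""
    ha = sum(x * y for x, y in zip(h, a))
    hh = sum(x * x for x in h)
    aa = sum(y * y for y in a)
    return snr * (aa - snr * ha * ha / (1.0 + snr * hh))

def optimal_alpha(h, a, snr):
    """alpha* = SNR * h.a / (1 + SNR * ||h||^2)."""
    ha = sum(x * y for x, y in zip(h, a))
    hh = sum(x * x for x in h)
    return snr * ha / (1.0 + snr * hh)

def best_integer_vector(h, snr, max_abs=3):
    """Brute-force search of the non-zero a in {-max_abs..max_abs}^L minimizing (9).

    The cost grows as (2 * max_abs + 1)^L, so this is only meant for small L.
    """
    best = None
    for a in itertools.product(range(-max_abs, max_abs + 1), repeat=len(h)):
        if not any(a):
            continue  # skip the all-zero vector
        var = effective_noise_variance(h, a, snr)
        if best is None or var < best[1]:
            best = (a, var)
    return best
```

Since (9) is even in $\mathbf{a}$, the search returns one of the two sign-equivalent minimizers $\pm\mathbf{a}$. A vector whose entries are all multiples of $p$ would give $q_\ell = 0$ for every user and should be excluded in practice.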

Fig. 2. Computation rates for three-user Gaussian MAC with coefficients
h = [f , 0.75, −√2] and ADC levels p = 3, p = 7, p = 17, or p = 251.



IV. Code construction for QCoF 

Given the equivalence of the QCoF computation rate with the capacity
of the additive noise channel over $\mathbb{Z}_p$ with noise $z$,
stated in Theorem 1, our goal is to design low-complexity
capacity-approaching linear codes for this channel. To this purpose,
we consider ensembles of protograph-based Low-Density Parity-Check
(LDPC) codes [20] over $\mathbb{Z}_p$. In particular, we consider a
random coding ensemble where the non-zero elements in the parity-check
matrix are chosen uniformly at random from $\mathbb{Z}_p^*$ (the
non-zero elements of $\mathbb{Z}_p$). In order to optimize and
evaluate the asymptotic (large $n$) performance of these codes we used
the Gaussian approximation to density evolution known as "EXIT chart",
as given in [15]. Messages generated by the Belief Propagation
decoder, in the form of log-likelihood ratios, are modeled as
$(p-1)$-dimensional correlated Gaussian vectors $\boldsymbol{\Lambda}$
with mean $\nu/2$ in each component and covariance matrix
$\boldsymbol{\Sigma}$ with elements $[\boldsymbol{\Sigma}]_{i,j} = \nu$
for $i = j$ and $[\boldsymbol{\Sigma}]_{i,j} = \nu/2$ for $i \neq j$.
Letting $V$ denote the code variable corresponding to the edge message
$\boldsymbol{\Lambda}$, we define the mutual information function

$$J(\nu) \triangleq I(V; \boldsymbol{\Lambda}) = 1 - \mathbb{E}\left[\log_p\left(1 + \sum_{i=1}^{p-1} e^{-\Lambda_i}\right)\right]. \tag{10}$$
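The function $J(\nu)$ has no simple closed form, but it can be estimated by Monte Carlo simulation. The sketch below assumes the expression $J(\nu) = 1 - \mathbb{E}[\log_p(1 + \sum_{i=1}^{p-1} e^{-\Lambda_i})]$ with the Gaussian message model described above; the function name `exit_j`, the sample size and the seed are illustrative choices. The correlated components are generated as $\Lambda_i = \nu/2 + \sqrt{\nu/2}\,(g_0 + g_i)$ with independent standard normal $g_0, g_1, \ldots, g_{p-1}$, which yields variance $\nu$ and pairwise covariance $\nu/2$.

```python
import math
import random

def exit_j(nu, p, samples=20000, seed=0):
    """Monte Carlo estimate of J(nu) = 1 - E[log_p(1 + sum_i exp(-Lambda_i))]."""
    rng = random.Random(seed)
    scale = math.sqrt(nu / 2.0)
    total = 0.0
    for _ in range(samples):
        shared = rng.gauss(0.0, 1.0)  # common term creating covariance nu/2
        acc = 1.0
        for _ in range(p - 1):
            llr = nu / 2.0 + scale * (shared + rng.gauss(0.0, 1.0))
            acc += math.exp(-llr)
        total += math.log(acc) / math.log(p)
    return 1.0 - total / samples
```

The estimate is 0 at $\nu = 0$ (uninformative messages) and tends to 1 as $\nu$ grows; in an EXIT analysis such a curve would typically be tabulated once and inverted numerically.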



The EXIT chart analysis tracks the evolution of the mutual information
$I(V; \boldsymbol{\Lambda})$ along the protograph edges [21]. A
protograph is a bipartite multigraph with $N_c$ "check" nodes and
$N_v$ "variable" nodes, described by a base matrix
$\mathbf{B} = [b_{i,j}]$ whose element $b_{i,j}$ indicates the
multiplicity of the edges connecting the variable node $V_j$ to the
check node $C_i$. For instance, the protograph corresponding to the
Repeat-Accumulate (RA) code ensemble [22] of rate 1/2 has base matrix
$\mathbf{B} = [4, 2]$.

A. Numerical results 

As an example, Fig. 2 shows the QCoF computation rate as a function of
SNR for an $L = 3$ user case. For large $p$ ($p = 251$), QCoF
approaches the computation rate of CoF within the shaping loss of
approximately 0.25 bits/symbol. For lower SNR, QCoF with smaller
values of $p$ yields satisfactory performance with lower complexity.
We also show the computation rates achieved by a family of RA codes
over $\mathbb{Z}_p$ with code rates (1/2, 2/3, 4/5) for different
values of $p$, which perform quite well, as shown in Fig. 2. As in the
binary case, we can expect that carefully optimized code ensembles can
closely approach the information theoretic computation rate at all
SNRs. For example, we designed a simple protograph-based Irregular
Repeat-Accumulate (IRA) code over $\mathbb{Z}_7$ with rate 1/2 defined
by the base matrix



B 



10 11110 
01110110 
110 10 11 
1 2 1 1 



(11) 



Higher coding rates 2/3 and 4/5 were obtained by the check node
merging technique in [23]. Even this simple IRA code design shows
noticeable improvement with respect to its RA counterpart, and almost
achieves the theoretical limit at rates 2/3 and 4/5. Charts like
Fig. 2 can be used as a guideline to select the modulation order $p$
and coding rate. For instance, at SNR = 25 dB it is reasonable to
choose $p = 7$ and an IRA code with rate 1/2.

V. Distributed antenna system 

We consider a distributed antenna system with $L$ users and $L$
single-antenna terminals connected to a central processor via wired
links of fixed rate $R_0$, as introduced in [4]. Also, we assume that
the $\ell$-th relay is equipped with a $p$-level ADC, as shown in
Fig. 1, implementing (4), followed by a decoder for the linear code
$\mathcal{C}$, producing the estimated linear combination
$\hat{\mathbf{c}}_\ell$ of the user codewords. For simplicity and for
the sake of comparison with [10], we consider the symmetric Wyner
model, for which the received signal at the $\ell$-th relay is

$$y_{\ell,i} = x_{\ell,i} + \gamma \left( x_{\ell-1,i} + x_{\ell+1,i} \right) + z_{\ell,i}, \tag{12}$$

where the user indices $\ell \pm 1$ are taken modulo $L$ and
$\gamma \in [0, 1]$. The achievable rate of QCoF is

$$R = \min\left\{ \max_{\mathbf{a}, \alpha} R(\mathbf{h}, \mathbf{a}, \alpha),\; R_0 \right\}, \tag{13}$$

where $\mathbf{h} = (\gamma, 1, \gamma)^T$ and
$R(\mathbf{h}, \mathbf{a}, \alpha)$ is given by Theorem 1. The central
processor receives via the noiseless links the $L$ linear combinations
of the user messages and, provided that no decoding error occurred at
the relays and that the overall $L \times L$ linear system matrix has
rank $L$, it solves for the user messages using Gaussian elimination.
In this example, thanks to the banded structure of the channel matrix,
the system matrix is guaranteed to have rank $L$. Using LDPC codes
with iterative Belief Propagation decoding, the overall complexity of
the receiver scales linearly with the coding block length $n$
(complexity $O(1)$ per decoded information bit), and polynomially with
the system size $L$. For the structured banded channel matrix of this
example, the complexity per decoded information bit is also $O(1)$
with respect to $L$.
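The final step at the central processor, solving the $L \times L$ system over $\mathbb{Z}_p$, can be sketched as a Gauss-Jordan elimination with modular inverses. The function name `solve_mod_p` and the matrix layout (one row per relay, one column per user) are assumptions made for this illustration; `pow(x, -1, p)` requires Python 3.8 or later.

```python
def solve_mod_p(Q, W, p):
    """Solve Q X = W over Z_p (p prime) by Gauss-Jordan elimination.

    Q: L x L matrix of coefficients q (row = relay, column = user).
    W: L x k matrix whose rows are the decoded linear combinations.
    Returns the L x k matrix X of user messages, or None if Q is singular.
    """
    size = len(Q)
    aug = [[q % p for q in Q[r]] + [w % p for w in W[r]] for r in range(size)]
    for col in range(size):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse of the pivot
        aug[col] = [(x * inv) % p for x in aug[col]]
        # Eliminate the column from every other row.
        for r in range(size):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]
```

For a banded system such as the one induced by the Wyner model, a specialized banded solver would avoid the cubic cost of this generic routine; the generic version is shown only because it is the simplest correct form.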



It is known that the performance of CoF (and therefore of QCoF) is
quite sensitive to the channel coefficients, due to the non-integer
penalty term. More favorable channel coefficients can be obtained by
using a power allocation (PA) strategy to reduce the impact of the
non-integer error term, which is simpler than the superposition
strategy in [10]. Odd-numbered UTs transmit at power $\beta P$ and
even-numbered UTs transmit at power $(2 - \beta) P$, for
$\beta \in [0, 1]$. The role of odd- and even-numbered UTs is reversed
at alternate time slots, such that each UT satisfies its individual
power constraint on average. Accordingly, the effective coefficients
of the channel for odd-numbered and even-numbered relays are
$\mathbf{h}_o = [\gamma\sqrt{2-\beta}, \sqrt{\beta}, \gamma\sqrt{2-\beta}]$
and
$\mathbf{h}_e = [\gamma\sqrt{\beta}, \sqrt{2-\beta}, \gamma\sqrt{\beta}]$.
For a given $\gamma$, the parameter $\beta \in [0, 1]$ can be
optimized to make the effective channels better suited for the integer
approximation. In this case, the rate achieved by QCoF with PA is
$R = \min\{R', R_0\}$, where

$$R' = \max_{\beta \in [0,1]} \min\left\{ \max_{\mathbf{a}, \alpha} R(\mathbf{h}_o, \mathbf{a}, \alpha),\; \max_{\mathbf{a}, \alpha} R(\mathbf{h}_e, \mathbf{a}, \alpha) \right\}.$$
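The effective coefficients under power allocation follow directly from the amplitudes $\sqrt{\beta}$ and $\sqrt{2-\beta}$ of the odd- and even-numbered UTs. A minimal sketch (the function name `pa_channels` is hypothetical, and the power $P$ is normalized to 1) is:

```python
import math

def pa_channels(gamma, beta):
    """Effective Wyner coefficients at odd- and even-numbered relays (P = 1).

    Odd-numbered UTs use amplitude sqrt(beta), even-numbered UTs use
    sqrt(2 - beta); each relay sees its own UT with gain 1 and its two
    neighbors (of opposite parity) with gain gamma.
    """
    amp_odd = math.sqrt(beta)
    amp_even = math.sqrt(2.0 - beta)
    h_odd = [gamma * amp_even, amp_odd, gamma * amp_even]
    h_even = [gamma * amp_odd, amp_even, gamma * amp_odd]
    return h_odd, h_even
```

Setting $\beta = 1$ recovers the symmetric coefficients $(\gamma, 1, \gamma)$ at every relay, and the time-averaged power per UT is $(\beta + 2 - \beta)/2 = 1$ for any $\beta$. In the optimization of $R'$, these two vectors would be evaluated over a grid of $\beta$ values.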

Notice that the odd- and even-numbered relays can optimize their own
equation coefficients independently, but the optimization with respect
to $\beta$ is common to both and the computation rate is the minimum
computation rate over all the relays, since the same code
$\mathcal{C}$ is used across all users. In Fig. 3 we show the system
performance for $R_0 = 2$ bits, $p = 7$ or $p = 251$ (e.g., about 3 or
8 bits ADC) and compare various relaying strategies. The achievable
rates for DF, CF, and CoF are as in [10], for the sake of comparison.

Not surprisingly, QCoF only pays the shaping loss with respect to CoF
when $p = 251$. Also, QCoF with $p = 7$ shows satisfactory performance
with lower complexity. The PA strategy significantly reduces the
integer approximation penalty and improves the achievable rate in the
middle range of $\gamma$. Notice also that Fig. 3 does not provide a
fair comparison, since the impact of a finite resolution ADC is not
considered in DF, CF, and CoF.